Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.
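The defining property above, that transition components are conditionally independent, can be illustrated with a minimal sketch. The tables `P1` and `P2` and the two-component binary state are hypothetical, not from the paper; the point is only that the full transition probability is a product of per-component factors:

```python
import numpy as np

# Toy FMDP: the state has 2 binary components; each component's next value
# depends only on its own scope of the current state-action pair.
rng = np.random.default_rng(0)

# Hypothetical conditional tables: P_i[scope_value, action, next_value],
# with the last axis a probability distribution over the component's next value.
P1 = rng.dirichlet(np.ones(2), size=(2, 2))  # component 1: scope is s[0]
P2 = rng.dirichlet(np.ones(2), size=(2, 2))  # component 2: scope is s[1]

def transition_prob(s, a, s_next):
    """Full transition probability as a product of component factors."""
    return P1[s[0], a][s_next[0]] * P2[s[1], a][s_next[1]]

# The factored model is a valid distribution: probabilities over all
# four next states sum to 1 for any fixed (s, a).
total = sum(transition_prob((0, 1), 0, (x, y)) for x in (0, 1) for y in (0, 1))
```

This factorization is what lets the learner estimate each small conditional table from its own counts instead of estimating one exponentially large transition matrix.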
Review for NeurIPS paper: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Additional Feedback: Response to author feedback: From the informal discussion about the cross-component counters, my understanding is that it is somehow bad if different components have been explored unevenly, and that encouraging more balanced (pairwise) exploration reduces the overall variance in the amount of exploration across components. I'm sure there's a lot I'm not getting, but that helps a bit. I think it should be the case that you recover an object when you multiply its factors together (for the appropriate definition of "multiply"). There are papers (well, just one I can think of) that deal with truly factored MDPs that are the product of simpler MDPs. They correctly call their MDPs factored.
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
Review for NeurIPS paper: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
While this paper initially drew some mild divergence of opinion among the reviewers, after the author response and some detailed discussion it was agreed that the paper makes a solid contribution (please see the revised reviews). It is certainly of relevance to NeurIPS. After discussion, there was agreement on the significance of the conceptual contribution, namely the treatment of the cross-component bonuses. Several reviewers note that the mathematics is fairly "standard" (Bernstein-bound machinery), though in the end that should not be considered a drawback. At least one reviewer notes that the 31-page appendix makes it impossible to verify all of the mathematical results during the review period.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the \emph{optimism in the face of uncertainty} principle and a Bernstein-type bonus. We show that UCBVI-$\gamma$ achieves an $\tilde{O}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $\gamma$ is the discount factor, and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$\gamma$ is nearly minimax optimal for discounted MDPs.
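The Bernstein-type bonus mentioned above can be sketched in its generic form: a variance-aware square-root term plus a lower-order $1/n$ correction. The function below is an illustration with placeholder constants, not the paper's exact bonus:

```python
import numpy as np

def bernstein_bonus(var_hat, n, gamma, delta=0.05, c=7.0):
    """Illustrative Bernstein-style exploration bonus: a term scaling with the
    estimated variance var_hat over n visits, plus a 1/n correction whose
    magnitude grows with the effective horizon 1/(1 - gamma).
    The constants (c, delta) are placeholders, not the paper's values."""
    log_term = np.log(1.0 / delta)
    return np.sqrt(2.0 * var_hat * log_term / n) + c * log_term / ((1.0 - gamma) * n)

# As the visit count n grows, the bonus shrinks toward zero, so the
# algorithm gradually shifts from exploration to exploitation.
small_n = bernstein_bonus(var_hat=1.0, n=10, gamma=0.9)
large_n = bernstein_bonus(var_hat=1.0, n=10_000, gamma=0.9)
```

Compared with a Hoeffding-style bonus of order $\sqrt{\log(1/\delta)/n}$, the variance term is what enables the tighter $(1-\gamma)^{-1.5}$ horizon dependence rather than $(1-\gamma)^{-2}$.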
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
He, Jiafan, Zhao, Heyang, Zhou, Dongruo, Gu, Quanquan
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
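The weighted linear regression scheme at the core of the abstract above has a standard closed form, $\hat{\theta} = (\Phi^\top W \Phi + \lambda I)^{-1} \Phi^\top W y$. The sketch below shows that generic estimator; the particular variance estimator used to set the weights in the paper is not reproduced here:

```python
import numpy as np

def weighted_ridge(Phi, y, weights, lam=1.0):
    """Weighted ridge regression: theta = (Phi^T W Phi + lam I)^{-1} Phi^T W y.
    In variance-weighted RL analyses, weights ~ 1/variance downweight noisy
    regression targets so that low-noise transitions dominate the fit."""
    W = np.diag(weights)
    d = Phi.shape[1]
    A = Phi.T @ W @ Phi + lam * np.eye(d)
    return np.linalg.solve(A, Phi.T @ W @ y)

# Sanity check on noiseless data: with a tiny ridge, the estimator
# recovers the generating parameter.
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta_true = np.array([2.0, -1.0])
theta_hat = weighted_ridge(Phi, Phi @ theta_true, np.ones(3), lam=1e-8)
```

Setting the weights from an estimate of each target's variance is what converts the worst-case $\sqrt{H^3}$ horizon dependence of unweighted least squares into the $\sqrt{H^3K}$ total-regret rate quoted above.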
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)
Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
Hu, Pihe, Chen, Yu, Huang, Longbo
We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computationally efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. LSVI-UCB$^+$ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism on elliptical potentials, and a refined analysis of the correction term. The algorithm is minimax optimal for linear MDPs up to logarithmic factors, which closes the $\sqrt{Hd}$ gap between the upper bound of $\widetilde{O}(\sqrt{H^3d^3T})$ in (Jin et al., 2020) and the lower bound of $\Omega(Hd\sqrt{T})$ for linear MDPs.
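The "elliptical potentials" mentioned above refer to the standard optimism term used in LSVI-style upper confidence value iteration: a bonus $\beta \sqrt{\phi^\top A^{-1} \phi}$, where $A$ is the regularized feature covariance. A minimal sketch, with a placeholder confidence radius `beta` rather than the paper's Bernstein-based one:

```python
import numpy as np

def ucb_bonus(phi, A_inv, beta):
    """Elliptical confidence bonus beta * sqrt(phi^T A^{-1} phi): large for
    feature directions the agent has rarely seen, small for well-covered ones."""
    return beta * np.sqrt(phi @ A_inv @ phi)

lam = 1.0
phi = np.array([1.0, 0.0])
A_before = lam * np.eye(2)                       # regularized covariance, no data
A_after = A_before + 100 * np.outer(phi, phi)    # after ~100 observations of phi
b_before = ucb_bonus(phi, np.linalg.inv(A_before), beta=1.0)
b_after = ucb_bonus(phi, np.linalg.inv(A_after), beta=1.0)
```

As a direction accumulates observations its bonus shrinks, which is exactly the mechanism that drives exploration toward under-covered parts of the feature space.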
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)